TextRunner: Open Information Extraction on the Web

نویسندگان

  • Alexander Yates
  • Michele Banko
  • Matthew Broadhead
  • Michael J. Cafarella
  • Oren Etzioni
  • Stephen Soderland
چکیده

Traditional information extraction systems have focused on satisfying precise, narrow, pre-specified requests from small, homogeneous corpora. In contrast, the TextRunner system demonstrates a new kind of information extraction, called Open Information Extraction (OIE), in which the system makes a single, data-driven pass over the entire corpus and extracts a large set of relational tuples, without requiring any human input. (Banko et al., 2007) TextRunner is a fullyimplemented, highly scalable example of OIE. TextRunner’s extractions are indexed, allowing a fast query mechanism. Our first public demonstration of the TextRunner system shows the results of performing OIE on a set of 117 million web pages. It demonstrates the power of TextRunner in terms of the raw number of facts it has extracted, as well as its precision using our novel assessment mechanism. And it shows the ability to automatically determine synonymous relations and objects using large sets of extractions. We have built a fast user interface for querying the results.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Open Information Extraction for the Web

1 3 , 8 1 0 , 0 0 0 T u p l e s ? P r i m a r y E n t i t i e s ? R e l a t i o n s F i l t e r i n g Figure 4.2: Open Extraction from Wikipedia: TextRunner extracts 32.5 million distinct assertions from 2.5 million Wikipedia articles. 6.1 million of these tuples represent concrete relationships between named entities. The ability to automatically detect synonymous facts about abstract entities...

متن کامل

Open Information Extraction Using Wikipedia

Information-extraction (IE) systems seek to distill semantic relations from naturallanguage text, but most systems use supervised learning of relation-specific examples and are thus limited by the availability of training data. Open IE systems such as TextRunner, on the other hand, aim to handle the unbounded number of relations found on the Web. But how well can these open systems perform? Thi...

متن کامل

Semantic Role Labeling for Open Information Extraction

Open Information Extraction is a recent paradigm for machine reading from arbitrary text. In contrast to existing techniques, which have used only shallow syntactic features, we investigate the use of semantic features (semantic roles) for the task of Open IE. We compare TEXTRUNNER (Banko et al., 2007), a state of the art open extractor, with our novel extractor SRL-IE, which is based on UIUC’s...

متن کامل

Filtering Information Extraction via User-Contributed Knowledge

Large repositories of knowledge can enable more powerful AI systems. Information Extraction (IE) is one approach to building knowledge repositories by extracting knowledge from text. Open IE systems like TextRunner [Banko et al., 2007] are able to extract hundreds of millions of assertions from Web text. However, because of imperfections in extraction technology and the noisy nature of Web text...

متن کامل

Relational Web Search

Facts are naturally organized in terms of entities, classes, and their relationships as in an entity-relationship diagram or a semantic network. Search engines have eschewed such structures because, in the past, their creation and processing have not been practical at Web scale. This paper introduces the extraction graph, a textual approximation to an entity-relationship graph, which is automat...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007